[improvement](cgroup) inactive_file should be treated as available memory to avoid query be cancelled#64347

Merged
yiguolei merged 3 commits into
apache:masterfrom
yiguolei:fix_mem
Jun 11, 2026
Conversation

@yiguolei

@yiguolei yiguolei commented Jun 10, 2026

Contributor

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Sometimes we see the following error in cgroup or k8s environments: Allocator sys memory check failed: Cannot alloc:5343, ...,process memory used 85.41 GB exceed limit 108.00 GB or sys available memory 5.88 GB less than low water mark 6.00 GB.
The mem_limit term is false (85.41 < 108), so the 5343-byte allocation is rejected only because sys available memory 5.88 GB < low water mark 6.00 GB. 5.88 GiB available implies cgroup_mem_usage of about 114 GiB (120 - 5.88), roughly 29 GiB above process memory used (85.41 GiB); that gap is unmapped read page cache. The kernel reclaims clean page cache before OOM, so the memory is effectively available, but Doris cannot reclaim it itself, so the rejection repeats on later allocations. (The low water mark of 6.00 GB is the default: min(120 - 108, 120 * 5%) = 6.)

Before this PR, cgroup_mem_usage = memory.current - inactive_file - slab_reclaimable, so active_file page cache was not treated as reclaimable memory and cgroup_mem_usage came out larger than RSS. This PR also subtracts active_file.
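
The arithmetic above can be checked with a minimal sketch. It assumes the 120 GB total that appears in the low-water-mark formula, and it takes the post-PR formula (also subtracting active_file) from the review summaries in this conversation. The function names are hypothetical and are not Doris APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; these are not Doris functions.

// Default low water mark as described above: min(total - mem_limit, total * 5%).
// All values are in bytes; integer math avoids floating-point rounding.
int64_t default_low_water_mark(int64_t total_bytes, int64_t mem_limit_bytes) {
    assert(total_bytes >= mem_limit_bytes);
    return std::min(total_bytes - mem_limit_bytes, total_bytes * 5 / 100);
}

// Adjusted cgroup usage before this PR: only inactive_file and
// slab_reclaimable are treated as reclaimable.
int64_t usage_before(int64_t memory_current, int64_t inactive_file,
                     int64_t slab_reclaimable) {
    return memory_current - inactive_file - slab_reclaimable;
}

// Adjusted cgroup usage after this PR: active_file is subtracted as well.
int64_t usage_after(int64_t memory_current, int64_t inactive_file,
                    int64_t active_file, int64_t slab_reclaimable) {
    return memory_current - inactive_file - active_file - slab_reclaimable;
}
```

With a 120 GiB total and a 108 GiB mem_limit, default_low_water_mark returns 6 GiB, matching the default quoted in the error message. For the same memory.stat values, usage_after is lower than usage_before by exactly active_file, which is the headroom the allocator check was missing.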

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@yiguolei

Contributor Author

run buildall

@hello-stephen

Contributor

Cloud UT Coverage Report

Increment line coverage 0.00% (0/1) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 78.39% (1915/2443)
Line Coverage 64.88% (34210/52725)
Region Coverage 65.30% (17598/26948)
Branch Coverage 53.97% (9346/17316)

@hello-stephen

Contributor
TPC-H: Total hot run time: 29737 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 0245a280ecd348cee7772ef9e9af09472a2b38f8, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17658	4377	4322	4322
q2	q3	10739	1386	878	878
q4	4681	496	360	360
q5	7569	867	615	615
q6	188	184	145	145
q7	793	874	634	634
q8	9552	1583	1657	1583
q9	6461	4497	4479	4479
q10	6832	1830	1545	1545
q11	448	282	258	258
q12	671	431	311	311
q13	18161	3557	2752	2752
q14	291	268	257	257
q15	q16	832	789	722	722
q17	1507	1133	820	820
q18	7049	5788	5609	5609
q19	1353	1383	1113	1113
q20	527	406	267	267
q21	6205	2806	2728	2728
q22	487	395	339	339
Total cold run time: 102004 ms
Total hot run time: 29737 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5251	5096	5094	5094
q2	q3	5135	5374	4778	4778
q4	2402	2464	1581	1581
q5	5100	5192	4888	4888
q6	258	200	136	136
q7	2039	1865	1771	1771
q8	2750	2300	2133	2133
q9	7668	7725	7639	7639
q10	4915	4894	4366	4366
q11	611	430	399	399
q12	815	812	600	600
q13	3046	3565	2826	2826
q14	281	286	261	261
q15	q16	722	738	643	643
q17	1323	1308	1285	1285
q18	7826	7080	7155	7080
q19	1142	1107	1110	1107
q20	2276	2279	2008	2008
q21	5689	4988	4883	4883
q22	553	509	398	398
Total cold run time: 59802 ms
Total hot run time: 53876 ms

@hello-stephen

Contributor
TPC-DS: Total hot run time: 169589 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 0245a280ecd348cee7772ef9e9af09472a2b38f8, data reload: false

query5	4326	646	480	480
query6	450	197	181	181
query7	4817	575	306	306
query8	367	215	199	199
query9	8774	4085	4053	4053
query10	438	349	258	258
query11	5935	2356	2151	2151
query12	156	102	106	102
query13	1316	596	428	428
query14	6426	5459	5102	5102
query14_1	4439	4458	4423	4423
query15	215	201	175	175
query16	1046	472	432	432
query17	1154	729	597	597
query18	2719	489	355	355
query19	218	188	147	147
query20	115	117	120	117
query21	229	149	121	121
query22	13665	13588	13465	13465
query23	17438	16400	16177	16177
query23_1	16195	16344	16452	16344
query24	7390	1785	1341	1341
query24_1	1319	1320	1329	1320
query25	556	435	412	412
query26	1081	316	164	164
query27	2658	544	336	336
query28	4435	2013	2029	2013
query29	1031	657	485	485
query30	314	241	198	198
query31	1142	1082	950	950
query32	112	61	59	59
query33	521	305	248	248
query34	1180	1147	662	662
query35	764	779	691	691
query36	1366	1430	1212	1212
query37	157	112	96	96
query38	3207	3190	3055	3055
query39	937	934	917	917
query39_1	876	878	898	878
query40	213	140	115	115
query41	71	68	71	68
query42	106	101	95	95
query43	335	340	294	294
query44	
query45	197	187	179	179
query46	1087	1209	769	769
query47	2317	2383	2221	2221
query48	391	396	305	305
query49	627	465	351	351
query50	970	368	263	263
query51	4424	4314	4250	4250
query52	88	91	88	88
query53	256	274	188	188
query54	289	212	204	204
query55	79	82	73	73
query56	264	218	212	212
query57	1449	1405	1332	1332
query58	246	216	215	215
query59	1621	1674	1413	1413
query60	280	243	232	232
query61	161	152	160	152
query62	731	657	582	582
query63	235	193	189	189
query64	2214	805	614	614
query65	
query66	1701	464	346	346
query67	29711	29776	29600	29600
query68	
query69	421	310	266	266
query70	988	927	913	913
query71	302	228	217	217
query72	3039	2694	2392	2392
query73	844	791	432	432
query74	5142	5004	4794	4794
query75	2662	2575	2226	2226
query76	2309	1197	796	796
query77	360	367	297	297
query78	12356	12305	11849	11849
query79	1418	1040	786	786
query80	600	475	410	410
query81	458	289	253	253
query82	613	158	124	124
query83	357	279	248	248
query84	
query85	885	527	436	436
query86	363	316	287	287
query87	3389	3419	3209	3209
query88	3646	2762	2737	2737
query89	413	392	331	331
query90	1956	190	189	189
query91	179	171	173	171
query92	64	63	58	58
query93	1518	1398	913	913
query94	566	364	297	297
query95	680	397	458	397
query96	1020	772	330	330
query97	2707	2701	2553	2553
query98	210	211	214	211
query99	1158	1178	1059	1059
Total cold run time: 250517 ms
Total hot run time: 169589 ms

@yiguolei yiguolei added dev/4.1.x usercase Important user case type label dev/4.0.x labels Jun 10, 2026
@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 16.00% (4/25) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.76% (28261/38315)
Line Coverage 57.78% (307588/532307)
Region Coverage 54.61% (257553/471633)
Branch Coverage 55.97% (111723/199627)

gavinchou
gavinchou previously approved these changes Jun 10, 2026
@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label Jun 10, 2026
@github-actions

Contributor

PR approved by at least one committer and no changes requested.

@github-actions

Contributor

PR approved by anyone and no changes requested.

@wyxxxcat wyxxxcat left a comment

Collaborator

LGTM

@wyxxxcat

Collaborator

/review

@github-actions github-actions Bot left a comment

Contributor

I found two cgroup-v2 accounting issues that should be fixed before merge.

Critical checkpoint conclusions:

  • Goal: the PR tries to treat active file cache as reclaimable so cgroup memory checks do not reject allocations too early. The goal is only partially met because the new BE accounting can now warn continuously for valid zero counters and can fail refreshes on non-atomic cgroup snapshots.
  • Scope: the code change is small and focused, with matching BE and cloud paths for the main active_file subtraction.
  • Concurrency/lifecycle: no new shared-state locking or lifecycle ownership was added. However, the cgroup files are read as separate external snapshots while memory/cache state can change concurrently; that race is not handled safely in BE.
  • Config/compatibility/persistence/data writes: no new config items, protocol/storage compatibility changes, transaction changes, or data-write paths are involved.
  • Parallel paths: BE and cloud both subtract active_file; the blocking issues are in the BE validation/logging path.
  • Tests: no dedicated test was added for active_file subtraction, zero-valued metrics, or the negative adjusted-usage edge. Existing PR checks show Linux BE and cloud tests passing; the macOS BE UT failure is an environment JDK-17 mismatch, not a code failure.
  • Observability/performance: the new warning can become high-frequency log spam on normal zero file-cache metrics.

User focus: no additional user-provided review focus was supplied.

Comment thread be/src/common/cgroup_memory_ctl.cpp Outdated
int64_t active_file = metrics_map.contains("active_file") ? metrics_map["active_file"] : 0;
int64_t slab_reclaimable =
metrics_map.contains("slab_reclaimable") ? metrics_map["slab_reclaimable"] : 0;
if (inactive_file < 1 || active_file < 1 || slab_reclaimable < 1) {

Contributor

These counters can legitimately be zero; that is different from the metric being absent. For example, a cgroup with only anonymous memory and no file-cache pages reports inactive_file 0 and active_file 0, but refresh_proc_meminfo() calls this path every memory_maintenance_sleep_time_ms (50 ms by default), so this warning would be emitted continuously. Please check key presence separately from the value, and rate-limit any warning for truly missing metrics.
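
One way to act on this suggestion is sketched below, assuming the metrics have already been parsed into a map as in the snippet under review. `find_metric` and `WarnLimiter` are hypothetical names introduced for illustration; a real change would more likely rely on the rate-limiting facilities of the project's logging library.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical helper: returns std::nullopt only when the key is absent.
// A present key with value 0 is a valid reading and is returned as 0.
std::optional<int64_t> find_metric(const std::unordered_map<std::string, int64_t>& metrics,
                                   const std::string& key) {
    auto it = metrics.find(key);
    if (it == metrics.end()) {
        return std::nullopt;
    }
    return it->second;
}

// Hypothetical rate limiter: allows at most one warning per interval.
// Not thread-safe; intended for a single periodic refresh thread.
class WarnLimiter {
public:
    explicit WarnLimiter(std::chrono::milliseconds interval) : _interval(interval) {
        assert(interval.count() >= 0);
    }

    // Returns true if a warning may be emitted at time `now`.
    bool should_warn(std::chrono::steady_clock::time_point now) {
        if (_has_last && now - _last < _interval) {
            return false;
        }
        _last = now;
        _has_last = true;
        return true;
    }

private:
    std::chrono::milliseconds _interval;
    std::chrono::steady_clock::time_point _last {};
    bool _has_last = false;
};
```

A caller would warn only when `find_metric` returns no value and `should_warn` agrees, so a cgroup that reports `inactive_file 0` stays silent while a truly missing key is still reported, at a bounded rate.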

// the reason why we subtract inactive_file described here:

const int64_t reclaimable_usage = inactive_file + active_file + slab_reclaimable;
if (*value < reclaimable_usage) {

Contributor

memory.current and memory.stat are read in separate syscalls, so this sum is not an atomic snapshot. With concurrent page-cache growth, memory.current can be from before the growth while active_file + inactive_file + slab_reclaimable is from after it; returning CgroupError makes MemInfo::refresh_proc_meminfo() drop the cgroup refresh state and temporarily recompute physical/available memory from host /proc/meminfo instead of the cgroup limit. Please handle the race by clamping/retrying rather than treating it as a fatal invariant.
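
The clamping option mentioned here could look roughly like the following sketch. `adjusted_usage_clamped` is a hypothetical name, and this is not necessarily how the merged code handles the race; it only illustrates treating a transiently negative result as zero instead of as an error.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: memory.current and memory.stat come from separate
// reads, so the reclaimable sum can briefly exceed memory.current. Clamp the
// adjusted usage at zero rather than failing the whole cgroup refresh.
int64_t adjusted_usage_clamped(int64_t memory_current, int64_t inactive_file,
                               int64_t active_file, int64_t slab_reclaimable) {
    assert(memory_current >= 0 && inactive_file >= 0 && active_file >= 0 &&
           slab_reclaimable >= 0);
    const int64_t reclaimable_usage = inactive_file + active_file + slab_reclaimable;
    return std::max<int64_t>(memory_current - reclaimable_usage, 0);
}
```

Clamping keeps the refresh on the cgroup code path, so the process does not temporarily fall back to host /proc/meminfo; the next periodic refresh then picks up a consistent pair of readings.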

@yiguolei

Contributor Author

run buildall

@github-actions github-actions Bot removed the approved Indicates a PR has been approved by one committer. label Jun 11, 2026
@hello-stephen

Contributor
TPC-H: Total hot run time: 28728 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 6b32cfe10e32b70c43722695fbbdf565e99b34e0, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17652	4075	3983	3983
q2	q3	10775	1363	807	807
q4	4682	459	333	333
q5	7533	865	578	578
q6	182	169	137	137
q7	764	853	617	617
q8	9468	1595	1531	1531
q9	6518	4489	4458	4458
q10	6810	1821	1522	1522
q11	443	263	249	249
q12	640	431	297	297
q13	18155	3327	2816	2816
q14	282	255	241	241
q15	q16	815	795	706	706
q17	1344	1014	757	757
q18	6693	5848	5538	5538
q19	1532	1403	1047	1047
q20	514	403	266	266
q21	6118	2704	2538	2538
q22	448	369	307	307
Total cold run time: 101368 ms
Total hot run time: 28728 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4910	4725	4742	4725
q2	q3	5123	5221	4643	4643
q4	2175	2178	1388	1388
q5	4924	4834	4739	4739
q6	234	182	133	133
q7	1850	1709	1537	1537
q8	2495	2095	1967	1967
q9	7467	7402	7469	7402
q10	4736	4634	4207	4207
q11	531	386	348	348
q12	717	734	542	542
q13	3042	3347	2848	2848
q14	267	275	254	254
q15	q16	677	696	615	615
q17	1274	1249	1245	1245
q18	7250	6790	6782	6782
q19	1116	1077	1116	1077
q20	2221	2228	1957	1957
q21	5321	4572	4389	4389
q22	527	443	410	410
Total cold run time: 56857 ms
Total hot run time: 51208 ms

@hello-stephen

Contributor
TPC-DS: Total hot run time: 169012 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 6b32cfe10e32b70c43722695fbbdf565e99b34e0, data reload: false

query5	4308	614	489	489
query6	436	191	181	181
query7	4953	533	288	288
query8	360	219	205	205
query9	8768	4070	4053	4053
query10	433	302	246	246
query11	5899	2351	2195	2195
query12	151	104	94	94
query13	1272	630	438	438
query14	6397	5388	5095	5095
query14_1	4397	4376	4409	4376
query15	207	202	175	175
query16	989	454	356	356
query17	1131	723	559	559
query18	2547	461	339	339
query19	196	178	136	136
query20	110	105	108	105
query21	215	139	117	117
query22	13804	13572	13565	13565
query23	17374	16525	16229	16229
query23_1	16273	16273	16387	16273
query24	7601	1756	1318	1318
query24_1	1297	1306	1306	1306
query25	529	437	375	375
query26	1303	323	177	177
query27	2685	587	339	339
query28	4442	2010	2039	2010
query29	1071	623	470	470
query30	312	232	197	197
query31	1107	1076	944	944
query32	110	61	55	55
query33	519	329	258	258
query34	1194	1107	655	655
query35	739	801	660	660
query36	1377	1373	1225	1225
query37	159	103	87	87
query38	3228	3153	3095	3095
query39	934	923	873	873
query39_1	867	891	881	881
query40	216	120	97	97
query41	66	61	60	60
query42	96	94	90	90
query43	317	319	279	279
query44	
query45	194	192	191	191
query46	1145	1230	710	710
query47	2390	2442	2207	2207
query48	375	437	309	309
query49	628	481	361	361
query50	1008	374	262	262
query51	4494	4333	4217	4217
query52	89	92	83	83
query53	257	271	192	192
query54	273	244	225	225
query55	80	77	73	73
query56	251	243	242	242
query57	1425	1432	1312	1312
query58	249	220	222	220
query59	1563	1651	1406	1406
query60	289	252	240	240
query61	174	169	164	164
query62	698	654	590	590
query63	242	190	186	186
query64	2603	800	660	660
query65	
query66	1787	473	390	390
query67	29741	29795	29641	29641
query68	
query69	410	293	264	264
query70	993	962	948	948
query71	295	220	208	208
query72	3008	2602	2324	2324
query73	837	760	406	406
query74	5112	4975	4779	4779
query75	2634	2564	2206	2206
query76	2317	1146	809	809
query77	356	381	296	296
query78	12346	12430	11855	11855
query79	1431	1036	773	773
query80	602	454	394	394
query81	453	275	249	249
query82	574	162	117	117
query83	350	272	245	245
query84	
query85	859	505	409	409
query86	359	293	284	284
query87	3406	3378	3180	3180
query88	3642	2764	2698	2698
query89	429	389	328	328
query90	1969	182	182	182
query91	168	157	131	131
query92	63	62	53	53
query93	1539	1597	901	901
query94	548	336	315	315
query95	708	466	340	340
query96	1049	756	345	345
query97	2666	2744	2580	2580
query98	217	206	205	205
query99	1134	1195	1020	1020
Total cold run time: 251257 ms
Total hot run time: 169012 ms

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 13.79% (4/29) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.97% (28333/38305)
Line Coverage 57.94% (308495/532435)
Region Coverage 54.67% (257913/471732)
Branch Coverage 56.14% (112121/199699)

@yiguolei

Contributor Author

/review

@github-actions github-actions Bot left a comment

Contributor

Automated review summary for 6b32cfe.

I reviewed the full PR diff and the existing inline review threads before proposing any new issue. I did not find an additional non-duplicate blocking issue to raise.

Checkpoint conclusions:

  • Goal: the PR changes cgroup v2 adjusted memory usage to subtract active_file in BE and cloud meta-service rate-limit accounting. The current code implements that formula.
  • Scope: the change is small and focused on cgroup memory accounting plus explanatory comments.
  • Concurrency/lifecycle: the BE path is periodically refreshed by the daemon and reads cgroup files via separate syscalls; the previously raised race around memory.current versus memory.stat is already covered by the existing inline thread, so I did not duplicate it.
  • Configuration/compatibility: no new config, protocol, storage, transaction, or persistence compatibility concerns.
  • Parallel paths: BE and cloud meta-service both received the active_file formula change.
  • Error handling/observability: no ignored Status was introduced in the changed code. Existing comments already cover the cgroup metric/race handling concerns.
  • Tests: no direct new deterministic test was added for the active_file accounting or clamp path. Existing TeamCity BE UT and Cloud UT statuses are successful; GitHub macOS BE UT failed before running tests because the runner uses JDK 25 instead of the required JDK 17, and the TeamCity coverage merge status is failing.
  • User focus: .code-review.QXaRV6/review_focus.txt contains no additional focus points.

No new inline comments submitted to avoid duplicating the already-known review threads.

@yiguolei

Contributor Author

/review

@github-actions github-actions Bot left a comment

Contributor

Automated review summary for 6b32cfe.

I reviewed the full PR diff, the repository code-review checklist, the BE common and Cloud meta-service module guides, and the existing inline review threads before proposing any issue. I did not find an additional non-duplicate blocking issue to raise.

Critical checkpoint conclusions:

  • Goal: the PR changes cgroup v2 adjusted memory usage to count active_file as reclaimable in both BE memory accounting and Cloud meta-service stress-rate-limit accounting. The implementation applies that formula in both paths.
  • Tests: I did not see a direct new deterministic test for the active_file formula or cgroup-stat edge cases. Current PR status shows compile, BE UT, Cloud UT, FE UT, and regression checks passing; GitHub macOS BE UT is failing before meaningful validation due to the runner JDK mismatch, and TeamCity coverage merge is failing.
  • Scope: the patch is small and focused on cgroup memory accounting plus comments.
  • Concurrency/lifecycle: no new shared ownership or locking was added. BE still reads memory.current and memory.stat as separate external snapshots; that race and the related clamp/log behavior are already covered by the existing inline review thread, so I did not duplicate it.
  • Configuration/dynamic config: no new config item was added.
  • Compatibility/persistence/transactions/data writes: no FE-BE protocol, storage format, EditLog, transaction, or data-write path is involved.
  • Parallel paths: BE cgroup v2 and Cloud meta-service cgroup v2 accounting were both updated; cgroup v1 continues to use RSS-only accounting.
  • Special conditions/error handling: no new ignored Status was introduced in the changed code. The already-known metric-presence and non-atomic snapshot concerns are represented by existing inline threads.
  • Observability/performance: no new hot-path allocation or lock issue found in this small patch; any warning behavior around the cgroup snapshot edge is already part of the existing review context.
  • User focus: .code-review.RZf5IY/review_focus.txt contains no additional focus points, so there was no extra focused issue to report.

I also ran git diff --check against the provided base/head range; it reported no whitespace errors.

@yiguolei

Contributor Author

skip check_coverage

@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label Jun 11, 2026
@github-actions

Contributor

PR approved by at least one committer and no changes requested.

@yiguolei yiguolei merged commit 92f800a into apache:master Jun 11, 2026
32 of 33 checks passed
github-actions Bot pushed a commit that referenced this pull request Jun 11, 2026
…mory to avoid query be cancelled (#64347)

yiguolei added a commit to yiguolei/incubator-doris that referenced this pull request Jun 11, 2026
…mory to avoid query be cancelled (apache#64347)

Labels

  • approved: Indicates a PR has been approved by one committer.
  • dev/4.0.x
  • dev/4.0.x-conflict
  • dev/4.1.x
  • reviewed
  • usercase: Important user case type label

4 participants